Smart Paper Notes

Automated analysis of fibrous cap in intravascular optical coherence tomography images of coronary arteries

Juhwan Lee, Gabriel T. R. Pereira, Yazan Gharaibeh, Chaitanya Kolluru, Vladislav N. Zimin, Luis A. P. Dallan, Justin N. Kim, Ammar Hoori, Sadeer G. Al-Kindi, Giulio Guagliumi

Categories: Machine Learning | Computer Vision

2022-04-21

Thin-cap fibroatheroma (TCFA) and plaque rupture have been recognized as the most frequent risk factors for thrombosis and acute coronary syndrome. Intravascular optical coherence tomography (IVOCT) can identify TCFA and assess cap thickness, which provides an opportunity to assess plaque vulnerability. We developed an automated method that can detect lipidous plaque and assess fibrous cap thickness in IVOCT images. This study analyzed a total of 4,360 IVOCT image frames of 77 lesions among 41 patients. To improve segmentation performance, preprocessing included lumen segmentation, pixel-shifting, and noise filtering on the raw polar (r, theta) IVOCT images. We used the DeepLab-v3 plus deep learning model to classify lipidous plaque pixels. After lipid detection, we automatically detected the outer border of the fibrous cap using a special dynamic programming algorithm and assessed the cap thickness. Our method provided excellent discriminability of lipid plaque with a sensitivity of 85.8% and an A-line Dice coefficient of 0.837. By comparing lipid angle measurements between two analysts following editing of our automated software, we found good agreement by Bland-Altman analysis (difference 6.7 ± 17 degrees; mean 196 degrees). Our method accurately detected the fibrous cap from the detected lipid plaque. Automated analysis required a significant modification for only 5.5% of frames. Furthermore, our method showed good agreement of fibrous cap thickness between two analysts with Bland-Altman analysis (4.2 ± 14.6 µm; mean 175 µm), indicating little bias between users and good reproducibility of the measurement. We developed a fully automated method for fibrous cap quantification in IVOCT images, resulting in good agreement with determinations by analysts. The method has great potential to enable highly automated, repeatable, and comprehensive evaluations of TCFAs.
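
The two agreement measures quoted in this abstract, the Dice coefficient and Bland-Altman analysis, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names and the toy data are hypothetical, and the abstract does not say whether the "±" values it reports are standard deviations or limits of agreement. The sketch uses the conventional 1.96 × SD limits.

```python
from statistics import mean, stdev


def dice_coefficient(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|) for two equal-length binary label sequences."""
    if len(pred) != len(truth):
        raise ValueError("sequences must have the same length")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty label sets are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # conventional 95% limits of agreement (sample SD)
    return bias, bias - spread, bias + spread


# Hypothetical toy data, not taken from the study.
pred_alines = [1, 1, 0, 0, 1, 0]   # per-A-line lipid labels from a model
true_alines = [1, 0, 0, 0, 1, 1]   # per-A-line lipid labels from an analyst
print(dice_coefficient(pred_alines, true_alines))

analyst_1 = [170.0, 182.0, 165.0, 190.0]  # cap thickness readings (µm)
analyst_2 = [168.0, 185.0, 160.0, 186.0]
print(bland_altman(analyst_1, analyst_2))
```

A bias near zero with narrow limits indicates that the two analysts' measurements are interchangeable in practice, which is the sense in which the abstract reports "little bias between users".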


MAViL: Masked Audio-Video Learners

Po-Yao Huang, Vasu Sharma, Hu Xu, Chaitanya Ryali, Haoqi Fan, Yanghao Li, Shang-Wen Li, Gargi Ghosh, Jitendra Malik, Christoph Feichtenhofer

Categories: Computer Vision

2022-12-15

We present Masked Audio-Video Learners (MAViL) to train audio-visual representations. Our approach learns with three complementary forms of self-supervision: (1) reconstruction of masked audio and video input data, (2) intra- and inter-modal contrastive learning with masking, and (3) self-training by reconstructing joint audio-video contextualized features learned from the first two objectives. Pre-training with MAViL not only enables the model to perform well in audio-visual classification and retrieval tasks but also improves representations of each modality in isolation, without using information from the other modality for fine-tuning or inference. Empirically, MAViL sets a new state-of-the-art on AudioSet (53.1 mAP) and VGGSound (67.1% accuracy). For the first time, a self-supervised audio-visual model outperforms ones that use external supervision on these benchmarks. Code will be available soon.


A Computer Vision Method for Estimating Velocity from Jumps

Soumyadip Roy, Chaitanya Roygaga, Nathaniel Blanchard, Aparna Bharati

Categories: Computer Vision

2022-12-09

Athletes routinely undergo fitness evaluations to evaluate their training progress. Typically, these evaluations require a trained professional who utilizes specialized equipment like force plates. For the assessment, athletes perform drop and squat jumps, and key variables are measured, e.g. velocity, flight time, and time to stabilization, to name a few. However, amateur athletes may not have access to professionals or equipment that can provide these assessments. Here, we investigate the feasibility of estimating key variables using video recordings. We focus on jump velocity as a starting point because it is highly correlated with other key variables and is important for determining posture and lower-limb capacity. We find that velocity can be estimated with a high degree of precision across a range of athletes, with an average R-value of 0.71 (SD = 0.06).


Spatial Relation Graph and Graph Convolutional Network for Object Goal Navigation

D. A. Sasi Kiran, Kritika Anand, Chaitanya Kharyal, Gulshan Kumar, Nandiraju Gireesh, Snehasis Banerjee, Ruddra dev Roychoudhury, Mohan Sridharan, Brojeshwar Bhowmick, Madhava Krishna

Categories: Robotics | Artificial Intelligence

2022-08-27

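This paper describes a framework for the object-goal navigation task, which requires a robot to find and move to the closest instance of a target object class from a random starting position. The framework uses a history of robot trajectories to learn a Spatial Relation Graph (SRG) and Graph Convolutional Network (GCN)-based embeddings for the likelihood of proximity of different semantically labeled regions and the occurrence of different object classes in these regions. To locate a target object instance during evaluation, the robot uses Bayesian inference and the SRG to estimate the visible regions, and uses the learned GCN embeddings to rank the visible regions and select the region to explore next.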



Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides

Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency

Categories: Artificial Intelligence | Natural Language Processing | Computer Vision | Machine Learning

2022-08-17

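Lecture slide presentations, sequences of pages that contain text and figures, are carefully constructed and presented in order to optimally transfer knowledge to students. Previous studies in multimedia and psychology attribute the effectiveness of lecture presentations to their multimodal nature. As a step toward developing AI that aids student learning as an intelligent teaching assistant, we introduce the Multimodal Lecture Presentations dataset as a large-scale benchmark for testing the capabilities of machine learning models in multimodal understanding of educational content. Our dataset contains aligned slides and spoken language for more than 180 hours of video and more than 9,000 slides, with 10 lecturers from various subjects (e.g., computer science, dentistry, biology). We introduce two research tasks designed as stepping stones toward AI agents that can explain (automatically captioning a lecture presentation) and illustrate (synthesizing visual figures to accompany spoken explanations) educational content. We provide manual annotations to help implement these two research tasks and evaluate state-of-the-art models on them. Comparing baselines with human student performance, we find that current models struggle with (1) weak cross-modal alignment between slides and spoken text, (2) learning novel visual mediums, (3) technical language, and (4) long-range sequences. Toward addressing this issue, we also introduce PolyViLT, a multimodal transformer trained with a multi-instance learning loss that is more effective than current approaches. We conclude by shedding light on the challenges and opportunities in multimodal understanding of educational presentations.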


Entity Anchored ICD Coding

Jay DeYoung, Han-Chin Shing, Luyang Kong, Christopher Winestock, Chaitanya Shivade

Categories: Machine Learning | Natural Language Processing

2022-08-15

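Medical coding is a complex task that requires assigning a subset of over 72,000 ICD codes to a patient's notes. Modern natural language processing approaches to this task have been challenged by the length of the input and the size of the output space. We limit our model inputs to a small window around the medical entities found in the documents. From these local contexts, we build contextualized representations of both ICD codes and entities, and aggregate over these representations to form document-level predictions. In contrast to existing methods, which use a representation fixed either in size or by the codes seen in training, we represent ICD codes by encoding the code description with local context. We discuss metrics appropriate for deploying coding systems in practice. We show that our approach is superior to existing methods on both standard and deployable measures, including performance on rare and unseen codes.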
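
The windowing step described in this abstract, restricting model input to a small context around each detected medical entity, can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the authors' pipeline: the function name `entity_windows`, the `radius` parameter, the token-level spans, and the example note are all hypothetical, and the abstract does not state the window size or how entities are detected.

```python
def entity_windows(tokens, entity_spans, radius=3):
    """Return one token window per entity span.

    tokens: list of str for the whole note.
    entity_spans: list of (start, end) token indices, end exclusive.
    radius: number of context tokens kept on each side (hypothetical default).
    """
    windows = []
    for start, end in entity_spans:
        lo = max(0, start - radius)          # clip at the start of the note
        hi = min(len(tokens), end + radius)  # clip at the end of the note
        windows.append(tokens[lo:hi])
    return windows


# Hypothetical note and entity spans ("chest pain" and "aspirin").
note = "patient reports chest pain and was started on aspirin for suspected angina".split()
spans = [(2, 4), (8, 9)]

for window in entity_windows(note, spans, radius=2):
    print(" ".join(window))
```

Each window would then be encoded separately and the per-window representations aggregated into a document-level prediction. Because the input length depends on the number of entities rather than the length of the note, long documents stay tractable.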


Learning Modular Structures That Generalize Out-of-Distribution

Arjun Ashok, Chaitanya Devaguptapu, Vineeth Balasubramanian

Categories: Machine Learning | Artificial Intelligence

2022-08-07

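Out-of-distribution (O.O.D.) generalization remains a key challenge for real-world machine learning systems. We describe a method for O.O.D. generalization that, through training, encourages models to preserve only those features in the network that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network to extract a modular sub-network that achieves better O.O.D. performance than the original network. Preliminary evaluation on two benchmark datasets corroborates the promise of our method.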
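
One common way to build a probabilistic, differentiable binary mask is the binary Concrete (Gumbel-sigmoid) relaxation, sketched below in plain Python. The abstract does not say which relaxation the authors use, so this is an illustrative assumption rather than their method. The function names, the temperature value, and the example logits are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relaxed_binary_mask(logits, temperature=0.5, rng=random):
    """Sample a soft mask value per unit with the binary Concrete relaxation.

    Each value is sigmoid((logit + logistic noise) / temperature), which is
    differentiable in the logit; lower temperatures push values toward 0 or 1.
    """
    mask = []
    for logit in logits:
        u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep log() finite
        noise = math.log(u) - math.log(1.0 - u)        # logistic noise sample
        mask.append(sigmoid((logit + noise) / temperature))
    return mask


def hard_mask(logits):
    """Deterministic mask for inference: keep units with keep-probability above 0.5."""
    return [1 if sigmoid(logit) > 0.5 else 0 for logit in logits]


# Hypothetical learned logits for three units.
unit_logits = [3.0, -3.0, 0.5]
random.seed(0)
print(relaxed_binary_mask(unit_logits))  # soft values used during training
print(hard_mask(unit_logits))            # sub-network selected after training
```

During training the soft mask multiplies unit activations so that gradients reach the logits. After training, the hard mask picks out the sub-network that is kept.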


Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao

Categories: Machine Learning | Machine Learning (Statistics)

2022-07-01

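Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited by the difficulty of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper, we present an implementation of two types of recurrent neural network layers (long short-term memory and gated recurrent unit) within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latency and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.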


Spherical Channels for Modeling Atomic Interactions

C. Lawrence Zitnick, Abhishek Das, Adeesh Kolluru, Janice Lan, Muhammed Shuaibi, Anuroop Sriram, Zachary Ulissi, Brandon Wood

Categories: Machine Learning

2022-06-29

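Modeling the energy and forces of atomic systems is a fundamental problem in computational chemistry, with the potential to help address many of the world's most pressing problems, including those related to energy scarcity and climate change. These calculations are traditionally performed using Density Functional Theory, which is computationally very expensive. Machine learning has the potential to dramatically improve the efficiency of these calculations, from days or hours to seconds. We propose the Spherical Channel Network (SCN) to model atomic energies and forces. The SCN is a graph neural network in which nodes represent atoms and edges connect them to their neighboring atoms. The atom embeddings are a set of spherical functions, called spherical channels, represented using spherical harmonics. We demonstrate that by rotating the embeddings based on the 3D edge orientation, more information can be used while maintaining the rotational equivariance of the messages. While equivariance is a desirable property, we find that relaxing this constraint in both message passing and aggregation improves accuracy. We demonstrate state-of-the-art results on the large-scale Open Catalyst 2020 dataset in both energy and force prediction for numerous tasks and metrics.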


The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysis

Richard Tran, Janice Lan, Muhammed Shuaibi, Siddharth Goyal, Brandon M. Wood, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi

Categories: Machine Learning

2022-06-17

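The computational catalysis and machine learning communities have made considerable progress in developing machine learning models for catalyst discovery and design. Yet a general machine learning potential that spans the chemical space of catalysis is still out of reach. A significant hurdle is obtaining access to training data across a wide range of materials. One important class of materials where data is lacking is oxides, which inhibits models from studying the oxygen evolution reaction and oxide electrocatalysis more generally. To address this, we developed the Open Catalyst 2022 (OC22) dataset, consisting of 62,521 Density Functional Theory (DFT) relaxations (~9,884,504 single-point calculations) across a range of oxide materials, coverages, and adsorbates (*H, *O, *N, *C, *OOH, *OH, *OH2, *O2, *CO). We define generalized tasks to predict the total system energy that are applicable across catalysis, develop baseline performance for several graph neural networks (SchNet, DimeNet++, ForceNet, SpinConv, PaiNN, GemNet-dT, GemNet-OC), and provide predefined dataset splits to establish clear benchmarks for future efforts. For all tasks, we study whether combining datasets leads to better results, even if they contain different materials or adsorbates. Specifically, we jointly train models on the Open Catalyst 2020 (OC20) dataset and OC22, or fine-tune pretrained OC20 models on OC22. In the most general task, GemNet-OC sees a ~32% improvement in energy predictions through fine-tuning and a ~9% improvement in force predictions through joint training. Surprisingly, joint training on both OC20 and the much smaller OC22 dataset also improves total energy predictions on OC20 by ~19%. The dataset and baseline models are open sourced, and a public leaderboard will follow to encourage continued community development on the total energy tasks and data.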
